Approximate Dimension Equalization in Vector-based Information Retrieval

Authors

  • Fan Jiang
  • Michael L. Littman
Abstract

Vector-based information retrieval methods such as the vector space model (VSM), latent semantic indexing (LSI), and the generalized vector space model (GVSM) represent both queries and documents by high-dimensional vectors learned from analyzing a training corpus of text. VSM scales well to large collections, but cannot represent term–term correlations, which prevents it from being used in translingual retrieval. GVSM and LSI can represent term–term correlations, but do not scale well to very large retrieval collections. We present a novel method we call approximate dimension equalization (ADE) that combines ideas from VSM, LSI, and GVSM to produce a method that performs well on large collections, scales well computationally, and can represent term–term correlations. We compare the performance of ADE to the other methods on both large and small collections of both monolingual and bilingual text. ADE outperforms all other methods on large bilingual collections, and performs close to the best in all other cases.
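
The contrast the abstract draws between VSM and GVSM can be illustrated with a toy sketch. This is not the ADE method, whose details the abstract does not give. It assumes one common formulation of GVSM, in which a query or document vector is mapped to its dot products with the training documents before the cosine is taken. The vocabulary, the training matrix `A`, and every function name below are hypothetical and exist only for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def to_training_space(A, v):
    """Map a term vector v to A^T v: one coordinate per training document."""
    n_docs = len(A[0])
    return [sum(A[t][j] * v[t] for t in range(len(v))) for j in range(n_docs)]

def vsm_score(q, d):
    """Plain VSM: compare term vectors directly."""
    return cosine(q, d)

def gvsm_score(A, q, d):
    """GVSM (assumed formulation): compare vectors after projecting onto training docs."""
    return cosine(to_training_space(A, q), to_training_space(A, d))

# Hypothetical bilingual vocabulary; rows of A follow this order.
TERMS = ["car", "flower", "voiture", "fleur"]

# Toy term-by-document training matrix built from two "parallel" training
# documents: doc 0 pairs car/voiture, doc 1 pairs flower/fleur.
A = [
    [1, 0],  # car
    [0, 1],  # flower
    [1, 0],  # voiture
    [0, 1],  # fleur
]

query = [1, 0, 0, 0]        # English query: "car"
doc_voiture = [0, 0, 1, 0]  # French document: "voiture"
doc_fleur = [0, 0, 0, 1]    # French document: "fleur"

if __name__ == "__main__":
    print("VSM  car vs voiture:", vsm_score(query, doc_voiture))
    print("GVSM car vs voiture:", gvsm_score(A, query, doc_voiture))
    print("GVSM car vs fleur:  ", gvsm_score(A, query, doc_fleur))
```

In this toy setup the English query and the French documents share no terms, so plain VSM cannot tell the two documents apart, while the projection through the training documents ranks the translation first. The projected vectors have one dimension per training document, which hints at why this kind of representation becomes costly as the training collection grows.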

Similar resources

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System

This paper describes a vector space equalization scheme for a concept-based collaborative information retrieval system; evaluation results are given. The authors previously proposed a peer-to-peer information exchange system that aims at smooth knowledge and information management to activate organizations and communities. One problem with the system arises when information is retrieved from an...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Approximate Dimension Reduction at NTCIR

We carried out a comparison of cross-language retrieval methods on the NTCIR-1 data based on dimension reduction (latent semantic indexing). These methods all use a collection of parallel documents (translations or approximate translations) and very little, if any, linguistic knowledge. In NTCIR-1, we compared latent semantic indexing, local LSI, and approximate dimensional equalization (ADE). We ...

A Review of Image Enhancement Technique Based on Wavelet Threshold and Neural Network

Image enhancement plays an important role in computer vision. Degraded, blurred, and noisy images affect the medical diagnosis of image data and the use of satellite images for information retrieval. Various authors and researchers have proposed methods of image enhancement such as histogram equalization, multi-point histogram equalization, and some methods based on neural networks and wavelet threshold...

Publication date: 2000